An Algorithm for High Accuracy Name Pronunciation by Parametric Speech Synthesizer
Abstract
The lack of automatic and accurate pronunciation of personal names by parametric speech synthesizers has become a crucial limitation for applications within the telecommunications industry, since the technology is needed to provide new automated services such as reverse directory assistance (number to name). Within text-to-speech technology, however, it was not possible to offer such functionality, because a text-to-speech device optimized for a specific language (e.g., American English) could not accurately pronounce names that originate from very different language families. A telephone book from virtually any section of the country will contain names from scores of languages as diverse as English and Mandarin, French and Japanese, Irish and Polish. The non-Anglo-Saxon names among them have traditionally been mispronounced by speech synthesizers, resulting in gross errors and unintelligible speech. This paper describes how an algorithm for high-accuracy name pronunciation was implemented in software based on a combination of cryptanalysis, statistics, and linguistics. The algorithm behind the utility is a two-stage procedure: (1) decoding the name to determine its etymological grouping; and (2) applying specific letter-to-sound rules (both segmental rules and stress-assignment rules) that give the synthesizer sufficient additional information to pronounce the name as a typical speaker of American English would. The default language and the thresholds are settable parameters and are also described. While the complexity of the software is invisible to application writers as well as users, this functionality now makes possible the automation of highly accurate name pronunciation by parametric speech synthesizer.
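The two-stage procedure in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual statistics, rule sets, or threshold values, so everything below is an assumption made for illustration: the toy name lists, the letter-trigram scoring with add-one smoothing, the margin-based `threshold`, the handful of Polish respelling rules, and all function names (`identify`, `letter_to_sound`, `pronounce`) are hypothetical. Stress assignment is omitted.

```python
import math
from collections import Counter

# Toy training names per language group (illustrative only, not from the paper).
TRAINING = {
    "english": ["smith", "johnson", "williams", "brown", "taylor", "walker"],
    "polish": ["kowalski", "nowak", "wisniewski", "wojcik", "kaminski", "zielinski"],
    "japanese": ["tanaka", "suzuki", "watanabe", "yamamoto", "nakamura", "kobayashi"],
}

def trigrams(name):
    """Letter trigrams of a name, padded with start (^) and end ($) markers."""
    padded = f"^^{name.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# One trigram count table per language group.
MODELS = {
    lang: Counter(t for n in names for t in trigrams(n))
    for lang, names in TRAINING.items()
}

def score(name, counts, vocab_size=27 ** 3):
    """Average log-probability of the name's trigrams, with add-one smoothing.

    vocab_size is a rough assumption: 26 letters plus one padding symbol.
    """
    total = sum(counts.values())
    grams = trigrams(name)
    logp = sum(math.log((counts[g] + 1) / (total + vocab_size)) for g in grams)
    return logp / len(grams)

def identify(name, default="english", threshold=0.05):
    """Stage 1: guess the etymological group.

    Falls back to the default language when the best score does not beat
    the runner-up by at least `threshold` (both are settable parameters).
    """
    ranked = sorted(((score(name, c), lang) for lang, c in MODELS.items()), reverse=True)
    (best, lang), (second, _) = ranked[0], ranked[1]
    return lang if best - second >= threshold else default

# Hypothetical segmental rules: grapheme -> rough English respelling.
# Only Polish has rules here; other groups pass letters through unchanged.
RULES = {
    "polish": [("sz", "sh"), ("cz", "ch"), ("w", "v"), ("j", "y")],
}

def letter_to_sound(name, lang):
    """Stage 2: apply the group's rules left to right, first matching rule wins."""
    out, i = [], 0
    name = name.lower()
    rules = RULES.get(lang, [])
    while i < len(name):
        for graph, phone in rules:
            if name.startswith(graph, i):
                out.append(phone)
                i += len(graph)
                break
        else:
            out.append(name[i])
            i += 1
    return "".join(out)

def pronounce(name, default="english", threshold=0.05):
    """Run both stages and return (language group, respelling)."""
    lang = identify(name, default, threshold)
    return lang, letter_to_sound(name, lang)

print(pronounce("kowalczyk"))  # ('polish', 'kovalchyk')
print(identify("xu"))          # no trigram evidence, so the default: 'english'
```

A margin threshold is only one plausible way to decide when to fall back to the default language; an absolute score cutoff would be another. Which criterion the described software uses is not stated in the abstract.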
Similar resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Synthesis of names by a demisyllable-based speech synthesizer (SPOKESMAN)
Many applications for text-to-speech synthesis involve the translation of names and/or addresses. However, most commercially available synthesizers are particularly poor at synthesizing names, since the focus of their development is typically synthesizing words. Names in the United States, for example, are often pronounced in accordance with rules different from the rules of English words. The ...
Pronunciation Modeling In Speech Synthesis
This dissertation investigates the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling t...
Automated Pronunciation Scoring for L2 English Learners
This study aims at developing an automated pronunciation scoring method for second language learners of English (hereafter, L2 learners) using both confidence scoring and classifiers. Pronunciation errors have been detected using the confidence measure from speech recognition [Franco et al. However, the accuracy of the assessment based on the confidence scores is not always ...
Pronunciation prediction with Default&Refine
The Default&Refine algorithm is a new rule-based learning algorithm that was developed as an accurate and efficient pronunciation prediction mechanism for speech processing systems. The algorithm exhibits a number of attractive properties including rapid generalisation from small training sets, good asymptotic accuracy, robustness to noise in the training data, and the production of compact rul...
Journal title:
- Computational Linguistics
Volume 17
Publication date: 1991